

UNCLASSIFIED

REPORT DOCUMENTATION PAGE

1. Report number: 85-12-02
4. Title (and subtitle): Automated Generation of Microcontrollers
5. Type of report and period covered: Technical, interim
7. Authors: Barry W. Jinks, David L. Pulfrey, Warren S. Snyder
8. Contract or grant numbers: MDA903-85-K-0072; ARPA-4563, /2, code 5D30
9. Performing organization name and address: UW/NW VLSI Consortium,
   Computer Science Department, FR-35, University of Washington,
   Seattle, WA 98195
11. Controlling office name and address: DARPA-IPTO, 1400 Wilson Boulevard
12. Report date: December 1985
14. Monitoring agency name and address: ONR, University of Washington,
    315 University District Building, 1107 NE 45th St., JD-16,
    Seattle, WA 98195
15. Security class (of this report): unclassified
16. Distribution statement (of this report): Distribution of this report
    is unlimited.
19. Key words: microcontrollers, CFL, MAGIC, RNL, NETLIST, BILBO, RAM,
    PLA, ALU, RISC, CMOS, NORA

20. Abstract: The concept of an algorithmic microcontroller has been
    investigated. Software has been created which, when supplied with
    design parametrics, will generate several representations of the
    desired instance. This paper discusses the methodology followed during
    generator creation and the architecture and instruction set of the
    resulting family of controllers.




Automated Generation of Microcontrollers


Barry W. Jinks*, David L. Pulfrey**, Warren S. Snyder***


Department of Computer Science
Seattle, Washington 98195
Technical Report 85-12-02
December 1985


* Microtel Pacific Research Liaison, UW/NW VLSI Consortium, Seattle, WA 98195
** Dept. of Electrical Engineering, U.B.C., Vancouver, B.C. V6T 1W5
*** GTE Laboratories, 40 Sylvan Road, Waltham, MA 02254




Funded in part by the Defense Advanced Research Projects Agency under Contract MDA903-85-K-0072.








I. INTRODUCTION

The use of microcomputers as on-chip building blocks is an
attractive proposition for designers wishing to realize large
integrated systems in silicon. Various approaches to this end are
being explored, including standard library macrocells, silicon
compilers and microcomputer generators. In a recent embodiment of
the latter approach [1], the architecture of the microcomputer
macrocells is essentially fixed, but the user has control over
important parameters such as data width, number of registers, and
memory content and size. In the present work we have taken this
concept a step further along the road to flexibility and area
minimization by adopting a fully parametric approach to the design
of a microcontroller.

By focusing on the microcontroller, rather than on a general-purpose
microcomputer, we have been able to limit the global instruction set
and hence reduce the complexity of the generator design. The
microcontroller generator is a software design environment consisting
of a suite of subprograms capable of producing a number of different
database representations of a given instance. When furnished with
the final system parametrics, the system synthesizes the mask
geometries for a microcontroller to be instantiated in a
user-specified system.

The  microcontroller  comprises  a  microprocessor, 
memory,  communications  protocol  hardware  and  analog 
and  digital  I/O.  It  is  intended  to  form  a  complete  control 
system  on  a  chip  for  use  in  telecommunications 
applications.  The  present  work  focuses  on  the 
microprocessor  and  memory  portions  of  this  generator. 

II. GENERATOR DEVELOPMENT METHODOLOGY

Initial system design and functional-level simulation were performed
using a behavioral-level simulator, after which a LISP-like
description of the circuit, including parasitics, was created with
MIT's NETLIST program. Leaf cells were then laid out using MAGIC, the
new layout editor from Berkeley. Positioning of the leaf cells is
performed using Coordinate Free LAP (CFL), a program developed at the
UW/NW VLSI Consortium [2].

CFL was designed with generator creation in mind. As such, it allows
the designer to write a C program which embodies the general
structure of the generator, even before the leaf cells have been
designed. As these cells are created or changed, CFL automatically
aligns them in the correct fashion. CFL also provides the designer
with autorouting primitives and, further, creates a complete
description of the border of the newly generated device. This lets
any higher-level program manipulate the device by accessing only a
very small amount of data.
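The coordinate-free style of assembly can be illustrated with a small Python sketch. It is not CFL's interface (CFL itself is programmed in C and its API is not described here); the `Cell` class and the `abut_right`, `abut_above`, `border` and `datapath` helpers are hypothetical. Cells are placed only relative to their neighbours, so changing a leaf cell's size re-aligns the whole block, and the composed block exposes just its footprint to higher-level programs.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """A leaf cell or composed block, described only by its size and placed children."""
    name: str
    width: int
    height: int
    # (child, x_offset, y_offset) triples, filled in by the placement helpers
    children: list = field(default_factory=list)


def abut_right(*cells: Cell, name: str = "row") -> Cell:
    """Place cells left to right, bottom-aligned; offsets are derived, never typed in."""
    row = Cell(name, 0, max(c.height for c in cells))
    x = 0
    for c in cells:
        row.children.append((c, x, 0))
        x += c.width
    row.width = x
    return row


def abut_above(*cells: Cell, name: str = "column") -> Cell:
    """Stack cells bottom to top, left-aligned."""
    col = Cell(name, max(c.width for c in cells), 0)
    y = 0
    for c in cells:
        col.children.append((c, 0, y))
        y += c.height
    col.height = y
    return col


def border(cell: Cell) -> dict:
    """What a higher-level program needs: the footprint, not the internals."""
    return {"name": cell.name, "width": cell.width, "height": cell.height}


def datapath(bits: int, bit_cell: Cell, ctrl: Cell) -> Cell:
    """Hypothetical generator: replicate a bit cell and put a control cell on top."""
    row = abut_right(*[bit_cell] * bits, name="bits")
    return abut_above(row, ctrl, name=f"datapath{bits}")
```

With a hypothetical 40 x 100 bit cell and a 150 x 30 control cell, a 4-bit instance reports a 160 x 130 footprint; widening the bit cell to 45 changes the width to 180 with no edit to the generator code.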

Once the layout phase was complete, the instances were extracted
using MAGIC. Simulations with the switch-level simulator RNL were
then performed to compare results with the transistor netlist,
following which changes to NETLIST, the leaf cells and CFL were made
to ensure agreement. Layout verification and any further necessary
model adjustments were then made prior to development of the test
procedure. A self-test methodology based on BILBO [3] is currently
being developed. It will be driven by the same input parameters as
the CFL and NETLIST programs, so that the result in a signature
register (see Figure 1) can be predicted and used to detect
simulated faults.
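In signature-analysis mode a BILBO register acts as a multiple-input signature register (MISR), compressing one parallel output word per clock into a short signature. The Python sketch below shows that compression for an 8-bit register; the feedback polynomial x^8 + x^4 + x^3 + x^2 + 1 and the function name are assumptions, since the report specifies neither. The fault-free signature is obtained by running the same function over the expected outputs, which is what allows it to be predicted from the generator parameters.

```python
def misr_signature(words, width=8, taps=0x1D, seed=0):
    """Compress a stream of parallel output words into a signature.

    Galois-style multiple-input signature register: shift left, apply the
    feedback polynomial if a 1 was shifted out, then XOR in the next word.
    taps=0x1D encodes x^8 + x^4 + x^3 + x^2 + 1 (an assumed polynomial,
    written without its x^8 term).
    """
    mask = (1 << width) - 1
    state = seed & mask
    for word in words:
        carry_out = (state >> (width - 1)) & 1
        state = (state << 1) & mask
        if carry_out:
            state ^= taps
        state ^= word & mask  # fold in this cycle's parallel outputs
    return state
```

Because the register is linear over GF(2), an error pattern injected at any cycle changes the signature by exactly the signature of that error pattern alone; for long random error streams an n-bit register still lets roughly one fault in 2^n alias to the fault-free value.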

The complete database representation of a generated microcontroller
comprises four files, namely a simulation file incorporating the
transistor netlist, a mask information file containing the layout
geometry and design rule checker information from MAGIC, a border
file giving the generator footprint from CFL, and a test file
containing the test data.
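The four-file database can be pictured as a single record per instance. This Python sketch is illustrative only: the class, its field names and the file extensions are hypothetical, not the names used by the generator.

```python
from dataclasses import dataclass, fields
from pathlib import Path


@dataclass(frozen=True)
class InstanceDatabase:
    """The four files that together describe one generated microcontroller."""
    simulation: Path  # transistor netlist, for switch-level simulation
    mask: Path        # layout geometry plus design-rule-checker information
    border: Path      # footprint of the generated block
    test: Path        # test data

    @classmethod
    def for_instance(cls, directory: str, name: str) -> "InstanceDatabase":
        # Hypothetical naming scheme: one file per representation, same stem.
        d = Path(directory)
        return cls(d / f"{name}.sim", d / f"{name}.mag",
                   d / f"{name}.border", d / f"{name}.test")

    def missing(self) -> list:
        """Representations whose files have not been generated yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name).exists()]
```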

A satisfactory degree of geometric rule independence is achieved by
creating a master rule set from the rules of foundries likely to be
employed in the microcontroller fabrication. The leaf cells are
designed from this rule set using a graphical layout editor. This
approach offers an alternative to the algorithmic technology file
approach proposed elsewhere [4]. As our intended applications for
microcontrollers are in the telecommunications field, both analog and
digital circuitry are likely to be required on the chip. This limits
the suitable fabrication technologies to those that use only
single-layer metal, since 'double-poly, double-metal' processes are
not widely available.
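One plausible reading of "master rule set" is that each geometric rule takes its most conservative value over the candidate foundries; for minimum-width, spacing and overlap rules that is the largest value. The Python sketch below assumes exactly that, and the rule names and numbers used with it are hypothetical.

```python
def master_rules(*foundry_rules: dict) -> dict:
    """Merge per-foundry minimum-dimension rules into one conservative set.

    Assumes every rule is a minimum (width, spacing, overlap), so the
    strictest value is the largest. A rule that only some foundries
    specify is still included, with the largest value given for it.
    """
    merged = {}
    for rules in foundry_rules:
        for name, value in rules.items():
            merged[name] = max(value, merged.get(name, value))
    return merged
```

For example, hypothetical foundries specifying a poly width of 2.0 and 2.5 microns yield a master rule of 2.5 microns, so a leaf cell drawn to the master set satisfies both.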

III. DESIGN OBJECTIVES AND PRIORITIES

In establishing priorities for the design, the following axioms were
considered to be appropriate to control systems in
telecommunications: i) control algorithms tend to concentrate on
bit-manipulation operations rather than arithmetic operations;
ii) the quantity of transient (i.e., read/write) data is relatively
small and the data structures are simple; iii) control processors
need not be high-performance machines; iv) system timing is of
critical importance in control applications.

Accordingly, it was determined that the priorities should be ordered
as follows: size, simplicity and performance. To keep the design
small, considerable effort was devoted to reducing the size of the
largest elements (i.e., the RAM, PLA and ALU). To simplify the
architecture, a reduced instruction set similar to RISC [5] was
chosen. This was further simplified by treating all data as globals
residing in the same memory space, which eliminates the need for
memory reference instructions. In addition, the number of control
lines has been kept low to reduce bus routing problems. To maintain
timing consistency, a constant cycle time for all instructions
(including branches) was chosen. Performance issues were addressed in
hardware by employing (1) sub-generators with an algorithmic driver
sizing capability [6], (2) a modest pipeline and (3) separate data
and control spaces. The resulting architecture is depicted in
Figure 1.


The RAM output, consisting of the contents of registers, I/O data or
immediates (which have passed through it), forms the input to the ALU
and shifter. These elements perform the required operation and force
the result back to the register file on the B bus only.

The RAM is optimized for speed, and it is noteworthy that the slowest
devices, namely the ALU (slow carry chain) and the PLA (high
capacitance on the term lines), are able to operate on cycle times
which are one half that of the RAM. The RAM, ALU and shifter, which
together form the data path, are pipelined so that one is
precharging while the other is evaluating.

The data path output may consist of branch addresses, which form the
input to the program counter. A PLA is used to compare the flags with
the instruction condition code. If a subroutine call is to be
executed, the current program counter and flags are pushed onto the
stack.

Interrupts are controlled by the interrupt handler. When an interrupt
is requested, the associated vector is forced onto the current PC
bus. This causes a jump to the state containing a call to the
appropriate interrupt service routine.
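The program-counter update described in the last two paragraphs can be modelled behaviourally. In this Python sketch the branch PLA is replaced by a lookup table, and the eight condition names, the flag set (Z, C, N) and the rule that a return resumes at the saved PC plus one are assumptions; the report states only that there are 8 branch conditions and that a call pushes the current program counter and flags. The interrupt case only selects the vector, because the report does not describe how the interrupted address is preserved.

```python
from dataclasses import dataclass, field

# Hypothetical condition table standing in for the branch PLA (8 conditions).
CONDITIONS = {
    "ALWAYS": lambda f: True,
    "Z": lambda f: f["Z"], "NZ": lambda f: not f["Z"],
    "C": lambda f: f["C"], "NC": lambda f: not f["C"],
    "N": lambda f: f["N"], "NN": lambda f: not f["N"],
    "NEVER": lambda f: False,
}


@dataclass
class Sequencer:
    pc: int = 0
    flags: dict = field(default_factory=lambda: {"Z": False, "C": False, "N": False})
    stack: list = field(default_factory=list)
    depth: int = 8  # stack depth is a generator parameter

    def step(self, kind="NEXT", cond="ALWAYS", target=0, irq_vector=None):
        """Advance one machine cycle and return the new PC."""
        if irq_vector is not None:
            # The vector is forced onto the PC bus; the state there holds a
            # call to the service routine (saving of the old PC not modelled).
            self.pc = irq_vector
            return self.pc
        taken = CONDITIONS[cond](self.flags)
        if kind == "JUMP" and taken:
            self.pc = target
        elif kind == "CALL" and taken:
            if len(self.stack) >= self.depth:
                raise OverflowError("stack depth exceeded")
            self.stack.append((self.pc, dict(self.flags)))  # push PC and flags
            self.pc = target
        elif kind == "RETURN" and taken:
            saved_pc, saved_flags = self.stack.pop()
            self.flags = saved_flags
            self.pc = saved_pc + 1  # assumed: resume after the call
        else:
            self.pc += 1  # sequential step, or condition not met
        return self.pc
```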
Since both program and processor are contained within the same
block, the system can be viewed as a finite state machine with the
state feedback being the current program counter. A microcoding
technique employing PLAs is used to produce the control and data path
signals in parallel. Separation of these paths allows PLA
minimization techniques to produce a more compact structure. The
output of the PLAs remains valid for an entire instruction cycle;
see Figure 2.
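A PLA computes each output as the OR of product terms, and each product term is an AND over selected input bits. With the program counter as the only input, every term simply recognizes a set of PC values. The Python sketch below models the two planes; the `(care_mask, match_value, output_bits)` term encoding, the function names and the example terms are hypothetical.

```python
from typing import List, Tuple

# One product term: (care_mask, match_value, output_bits).
# AND plane: the term fires when the PC bits selected by care_mask equal match_value.
# OR plane: output_bits of every firing term are ORed into the output word.
Term = Tuple[int, int, int]


def pla_eval(terms: List[Term], pc: int) -> int:
    out = 0
    for care, value, bits in terms:
        if (pc & care) == value:
            out |= bits
    return out


def microcode(control_pla: List[Term], datapath_pla: List[Term], pc: int) -> Tuple[int, int]:
    """Separate PLAs produce the control and data-path words in parallel from the same state."""
    return pla_eval(control_pla, pc), pla_eval(datapath_pla, pc)
```

A term such as `(0b100, 0b100, 0b10)` leaves the two low PC bits as don't-cares, so a single product term asserts its output for all four states 4 to 7; this is the kind of saving PLA minimization exploits.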

System timing is based on a two-phase non-overlapping clock scheme,
with one machine cycle taking two ticks of each clock to complete.
The machine cycle is partitioned into read and write half-cycles by a
state clock derived from one of the system clock phases; see
Figure 2. The RAM must be read in the first half-cycle and written in
the second half-cycle. The output of the RAM is dynamically latched
by C2MOS latches, using NORA [7] circuit techniques. The RAM itself
is implemented in domino CMOS.
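The half-cycle structure can be tabulated tick by tick. The Python sketch below assumes the state clock is obtained by dividing phase 1 by two; the report says only that it is derived from one of the system clock phases.

```python
def machine_cycle():
    """Return (tick, phase, half_cycle) for the four clock ticks of one machine cycle.

    Assumption: the state clock is phase 1 divided by two, so it goes high on
    the first phi1 tick (read half-cycle) and low on the second (write half-cycle).
    """
    schedule = []
    state_clock = 0
    for tick in range(4):  # two ticks of each of the two phases
        phase = "phi1" if tick % 2 == 0 else "phi2"  # non-overlapping, alternating
        if phase == "phi1":
            state_clock ^= 1  # divide-by-two of phi1
        half = "read" if state_clock else "write"
        schedule.append((tick, phase, half))
    return schedule
```

Each half-cycle therefore contains one tick of each phase: the RAM is read during ticks 0 and 1 and written during ticks 2 and 3.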
FIGURE 2. SYSTEM TIMING

V.  GENERATOR  INPUT  PARAMETERS 

The 9 lower-level blocks of the controller are synthesized by 8
unique sub-generators. The input parameters to these generators are
the register address width, number of interrupts, number of I/O
ports, program counter width, stack depth and data path width. In
addition, the ALU, shifter and branch PLA operations are defined.
Many of these parameters can be synthesized by analysis of the
instructions used in a particular program. This ensures a compact
microcontroller capable of executing only a local subset of the
global instruction set.
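Deriving parameters from a program can be sketched as a scan over the instructions it uses. In this Python sketch the program is reduced to a list of mnemonics (a simplification), only three of the parameters named above are derived, and grouping COMPARE with the arithmetic operations is an assumption. The `GeneratorParams` and `derive_params` names are hypothetical.

```python
from dataclasses import dataclass

ARITHMETIC = {"ADD", "ADDC", "SUB", "SUBC", "COMPARE"}  # COMPARE assumed to need the carry chain
SHIFTS = {"SHLL", "SHLA", "SHRL", "SHRA", "RLL", "RLA", "RRL", "RRA"}


@dataclass
class GeneratorParams:
    data_width: int
    pc_width: int
    alu_kind: str            # "arithmetic/logical" or "logical"
    shifter_ops: frozenset   # only these shifter paths are synthesized


def derive_params(program, data_width):
    """Derive instance parameters from a program given as a list of mnemonics."""
    used = set(program)
    alu_kind = "arithmetic/logical" if used & ARITHMETIC else "logical"
    # Enough PC bits to address every program step (at least one bit).
    pc_width = max(1, (len(program) - 1).bit_length())
    return GeneratorParams(data_width, pc_width, alu_kind, frozenset(used & SHIFTS))
```

A five-step program using only MOVE, AND, OR, SHLL and a jump yields a logical-only ALU, a shifter with only the SHLL paths, and a 3-bit program counter.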

The ALU is an example of where this approach to area reduction is
used. In addition to the data path width, the ALU generator has a
switch which directs it to synthesize an 'arithmetical/logical' or
just a 'logical' unit, depending on the particular operations
required. The shifter generator works in a similar way, synthesizing
only the paths actually used in the crossbar switch instead of all
those that are possible.
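Synthesizing only the used crossbar paths amounts to taking the union of the connections each used shift operation needs. The Python sketch models SHLL, SHRL, SHRA and RLL using their conventional meanings, which is an assumption since the report does not define the mnemonics; the pass-through path for non-shift operations is not modelled, and the function names are hypothetical.

```python
def paths(op, n):
    """Crossbar connections (out_bit, in_bit) one shift op needs on an n-bit word.

    Mnemonic meanings are conventional interpretations, not taken from the report.
    """
    if op == "SHLL":  # logical left: out[i] <- in[i-1], out[0] <- 0
        return {(i, i - 1) for i in range(1, n)}
    if op == "SHRL":  # logical right: out[i] <- in[i+1], msb <- 0
        return {(i, i + 1) for i in range(n - 1)}
    if op == "SHRA":  # arithmetic right: as SHRL, but the msb keeps the sign bit
        return {(i, i + 1) for i in range(n - 1)} | {(n - 1, n - 1)}
    if op == "RLL":   # rotate left: out[i] <- in[(i-1) mod n]
        return {(i, (i - 1) % n) for i in range(n)}
    raise ValueError(f"no path model for {op}")


def crossbar(ops, n):
    """Union of the connections of all used ops: only these switch points are laid out."""
    needed = set()
    for op in ops:
        needed |= paths(op, n)
    return needed
```

For an 8-bit shifter, a program using all four of these operations needs 16 of the 64 possible crossbar switch points, and one using only SHLL needs 7.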

VI.  INSTRUCTION  SET 

The global instruction set has been devised to provide a limited but
powerful set of instructions. All instructions execute in one machine
cycle, and there are 22 instruction types; see Figure 3.

All of the read/write registers in the processor reside in the
dual-ported register file. The I/O address space is also memory
mapped; therefore, there are no divisions between registers, memory
and I/O ports. This greatly simplifies the instruction set while
making it completely orthogonal. Several special (x) latches may also
reside in the RAM. These allow indirect addresses to be stored and
used to access the registers. Since the RAM is designed to have the A
and B buses read simultaneously but to be written only from the B
bus, all data path operations are of the form:

  (Reg1/data/io) OP (Reg2/io) -> (Reg2/io)
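The operand form above can be exercised with a behavioural model of one data-path cycle. In this Python sketch the register file, I/O ports and the indirect latch x share one address space, as the report describes; the address chosen for x, the operand encoding and the function names are hypothetical, and subtraction is omitted because the report does not give its operand order.

```python
X = 31  # hypothetical address of the indirect-address latch x

OPS = {
    "ADD": lambda a, b: a + b,
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "MOVE": lambda a, b: a,
}


def address(mem, operand):
    """Resolve a ('reg', addr) or ('ind', None) operand to a RAM address."""
    kind, value = operand
    if kind == "reg":  # register, I/O port or x itself: all memory mapped
        return value
    if kind == "ind":  # *x: the address is the contents of latch x
        return mem[X]
    raise ValueError(f"operand {operand!r} has no address")


def execute(mem, op, src, dst, width=8):
    """One data-path cycle of the form (src) OP (dst) -> (dst)."""
    mask = (1 << width) - 1
    a = src[1] if src[0] == "data" else mem[address(mem, src)]  # A bus
    b_addr = address(mem, dst)  # destination must be addressable
    b = mem[b_addr]  # B bus read
    mem[b_addr] = OPS[op](a, b) & mask  # written back on the B bus only
    return mem[b_addr]
```

Because the destination is always read and written through the B bus, an immediate can appear only as the first operand, which matches the operand pairs listed in Figure 3.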

The source of jump or call addresses is also the RAM; thus addresses
may be immediate or computed. At present, 8 branch conditions are
provided.

By keeping the number of control lines small (≤ 14), it has proved
possible to implement horizontal microcode (i.e., no instruction
decode), producing in this case 784 unique instructions. This
provides maximum flexibility while maintaining architectural
simplicity. As a result, operations which do not use the data path
(i.e., flags, interrupt control, return) can be executed in parallel
with those that do.
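Horizontal microcode means each field of the control word drives its control lines directly, so a single word can request a data-path operation together with, for example, a flags latch or a return. The Python sketch below packs and unpacks such a word; the field names and widths are hypothetical, chosen only to stay within the 14 control lines mentioned above.

```python
# Hypothetical field layout (name, width in bits), least significant field first.
FIELDS = [("dp_op", 4), ("dp_enable", 1), ("cond", 3), ("latch_flags", 1),
          ("int_enable", 1), ("int_disable", 1), ("ret", 1)]


def pack(**values):
    """Build a control word; fields not given default to 0 (inactive)."""
    word, shift = 0, 0
    for name, width in FIELDS:
        v = values.get(name, 0)
        if v < 0 or v >> width:
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word |= v << shift
        shift += width
    return word


def unpack(word):
    """Split a control word back into fields; no decoding beyond bit slicing."""
    out, shift = {}, 0
    for name, width in FIELDS:
        out[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return out
```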

DATA PATH OPERATIONS:

  operand pairs:     operations:
  data, reg          ADD      SHLL
  reg1, reg2         ADDC     SHLA
  *x, reg            SUB      SHRL
  reg, *x            SUBC     SHRA
  data, *x           AND      RLL
  x, reg             OR       RLA
  reg, x             XOR      RRL
                     MOVE     RRA

  *x = an indirect address

OTHER OPERATIONS:
  FLAGS LATCH
  CALL ON CONDITION
  RETURN ON CONDITION
  JUMP ON CONDITION
  COMPARE
  ENABLE/DISABLE INTERRUPT

FIGURE 3. GLOBAL INSTRUCTION SET

FIGURE 4. CHECK PLOT


VIII. CONCLUSION


A design environment for the automatic generation of microcontrollers
in single-chip telecommunications applications has been developed.
When the input parameters are supplied, the generator synthesizes the
mask geometries for the required microcontroller. By analyzing the
instructions used in a particular program, the layouts of major
components such as the ALU and the barrel shifter are defined. In
this manner, a compact layout is produced for each different
microcontroller instance. Several microcontrollers with data paths
ranging from 2 to 16 bits have been simulated. The corresponding
machine cycle times are predicted to be 180 to 600 nanoseconds.
VII. RESULTS

The generator has been designed in 3 µm CMOS, using a conservative
set of the MOSIS design rules. Several instances of the
microcontroller have been simulated using NETLIST in conjunction with
RNL. This has shown that the speed of a particular instance is highly
dependent on the resources used. For example, for moderate word
widths, the ALU carry chain delay dominates. If the carry chain is
not present (i.e., logical operations only), the program counter
becomes the speed-determining factor. Most instantiations are
expected to perform with a processing rate better than 2 million
instructions/second.
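Because every instruction completes in one machine cycle, the instruction rate R is the reciprocal of the cycle time. Applied to the 180 to 600 ns cycle times predicted for 2- to 16-bit data paths, this gives:

```latex
R = \frac{1}{t_{\mathrm{cycle}}}, \qquad
t_{\mathrm{cycle}} = 180\ \mathrm{ns} \Rightarrow R \approx 5.6 \times 10^{6}\ \mathrm{instructions/s}, \qquad
t_{\mathrm{cycle}} = 600\ \mathrm{ns} \Rightarrow R \approx 1.7 \times 10^{6}\ \mathrm{instructions/s},

R > 2 \times 10^{6}\ \mathrm{instructions/s} \iff t_{\mathrm{cycle}} < 500\ \mathrm{ns}.
```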

Figure 4 is the check plot of a test instance with an 8-bit word
width, 31 general-purpose registers, 1 indirect address register and
an 8-deep stack. Each sub-generator is fully implemented (i.e., all
instruction types can be executed) and the program is 28C steps in
length. The pad frame has a cavity which is 5.5 mm on a side.

Some of the major data path items, notably the RAM and the shifter,
have already been fabricated. Initial testing shows that the
functionality and performance track the expected results well.